2019-04-27
August 2017: Advanced Micro Devices (AMD) x86 "Zen" core architecture released
Ryzen retail, Threadripper high-performance, and EPYC server product lines.
Similar performance to Intel, but at a deep discount.
Twice the core count for less money, or more CPU for the same money.
Zen product lines are gaining acceptance among graphic designers, gamers, and data professionals.
Benchmarking here uses the benchmarkme package by Colin Gillespie.
library(benchmarkme)
Contains functions which run matrix algebra computations on random data.
Matrix algebra computations are the core of statistical/Machine Learning models.
Contains crowd-sourced benchmarks from other useRs for comparison.
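A minimal sketch of the benchmarkme workflow, assuming the package API as of benchmarkme 1.x (argument names may differ across versions):

```r
# Run the standard benchmark suite and compare against crowd-sourced results.
library(benchmarkme)

res <- benchmark_std(runs = 3)  # programming + matrix-algebra benchmarks
plot(res)                       # ranks your machine among uploaded results
# upload_results(res)           # optionally share your timings with other useRs

get_cpu()                       # CPU model string
get_ram()                       # installed RAM
```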
Benchmarks based heavily on the R script by Simon Urbanek & Doug Bates:
3,500,000 Fibonacci numbers calculation (vector calc).
Creation of a 3500x3500 Hilbert matrix (matrix calc).
Greatest common divisors of 1,000,000 pairs (recursion).
Creation of a 1600x1600 Toeplitz matrix (loops).
Escoufier's method on a 60x60 matrix (mixed).
Creation, transpose, deformation of a 2500x2500 matrix.
2500x2500 normally distributed random matrix ^1000.
Sorting of 7,000,000 random values.
2500x2500 cross-product matrix (b = a’ * a)
Linear regression over a 3000x3000 matrix.
FFT over 2,500,000 random values.
Eigenvalues of a 640x640 random matrix.
Determinant of a 2500x2500 random matrix.
Cholesky decomposition of a 3000x3000 matrix.
Inverse of a 1600x1600 random matrix.
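A few of the kernels above can be timed directly in base R; this sketch reproduces the cross-product, sorting, and FFT steps (sizes taken from the list above):

```r
set.seed(1)

# 2500x2500 cross-product matrix (b = a' * a)
a <- matrix(rnorm(2500 * 2500), nrow = 2500)
system.time(b <- crossprod(a))

# Sorting of 7,000,000 random values
x <- rnorm(7e6)
system.time(sort(x, method = "quick"))

# FFT over 2,500,000 random values
y <- rnorm(2.5e6)
system.time(fft(y))
```

The matrix-heavy kernels (cross-product, regression, Cholesky, inverse) are the ones that dispatch to BLAS/LAPACK, which is why the choice of BLAS library matters below.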
## You are ranked 12 out of 163 machines.
## You are ranked 4 out of 162 machines.
## You are ranked 10 out of 162 machines.
## You are ranked 2 out of 5 machines.
## You are ranked 1 out of 5 machines.
## You are ranked 4 out of 5 machines.
General Programming Benchmarks are excellent!
Linear Algebra Calculations are excellent!
Linear Algebra Functions lag severely when run in parallel!
The goal of this entire build was to run models in parallel, so this is no good!
base-R interfaces with BLAS (Basic Linear Algebra Subprograms) routines that provide standard building blocks for performing linear algebra operations.
scalar multiplication
dot products
linear combinations
matrix operations
Written in both C and FORTRAN
Papers and History here: http://www.netlib.org/blas/
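Before swapping libraries, it helps to see which BLAS/LAPACK a session is actually using. On Linux with R >= 3.4, `sessionInfo()` reports the shared-library paths (a sketch; the fields may be empty on other platforms):

```r
# Which BLAS/LAPACK shared libraries is this R session linked against?
si <- sessionInfo()
si$BLAS       # path to the BLAS shared library on Linux
si$LAPACK     # path to the LAPACK shared library

La_version()  # LAPACK version string
```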
Let's Try another BLAS!
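On Debian/Ubuntu, swapping the system BLAS/LAPACK is a package install plus `update-alternatives`, per Dirk Eddelbuettel's approach linked below; R picks up the change on the next start, no recompilation needed. A sketch assuming OpenBLAS and the Debian alternative names (verify the names on your release):

```shell
# Install OpenBLAS, then choose among the registered BLAS/LAPACK alternatives.
sudo apt-get install libopenblas-dev

sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
```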
## You are ranked 2 out of 5 machines.
## You are ranked 2 out of 7 machines.
Poking around the internet and researching various BLAS libraries led me to BLIS and the FLAME project!
BLIS/libFlame are high performance dense linear algebra libraries, each addressing a layer in the linear algebra software stack.
Primarily developed and maintained by individuals in the Science of High-Performance Computing (SHPC) group in the Institute for Computational Engineering and Sciences at The University of Texas at Austin.
BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, enable optimized implementations of most of its commonly used and computationally intensive operations. Build BLIS from source on Github here: https://github.com/flame/blis
libFLAME is a high performance dense linear algebra library that is the result of the FLAME methodology for systematically developing dense linear algebra libraries. The FLAME methodology is radically different from the LINPACK/LAPACK approach that dates back to the 1970s, but is backwards compatible with them. Build libFLAME from source on Github here: https://github.com/flame/libflame/
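Building BLIS from the GitHub repository is a standard configure/make sequence; this sketch assumes the OpenMP threading option and the auto-detected configuration target described in the BLIS build docs (install paths are illustrative):

```shell
# Build multithreaded BLIS from source and install it (default: /usr/local).
git clone https://github.com/flame/blis.git
cd blis
./configure --enable-threading=openmp auto   # 'auto' detects the Zen kernels
make -j16
sudo make install
```

Once installed, the resulting library can be registered with `update-alternatives` in the same way as any other BLAS so that R links against it.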
The best part is: They really ROCK!
## You are ranked 1 out of 5 machines.
## You are ranked 1 out of 7 machines.
BLIS/libFLAME places 1st of 7 with a time of 10.66 seconds, beating the next submission by a factor of 5 and the last by a factor of 20.
No Intel submissions with this many cores to benchmark against.
| time (s) | cpu | ram (GB) | sysname | release | cores |
|---|---|---|---|---|---|
| 10.663 | AMD Ryzen Threadripper 1950X 16-Core Processor | NA | Linux | 4.19.0-041900-generic | 16 |
| 50.583 | AMD Ryzen Threadripper 1950X 16-Core Processor | NA | Windows | >= 8 x64 | 16 |
| 159.536 | AMD Ryzen Threadripper 1950X 16-Core Processor | 33.664 | Linux | 4.19.9-041909-generic | 16 |
| 160.229 | AMD Ryzen Threadripper 1950X 16-Core Processor | 33.664 | Linux | 4.19.9-041909-generic | 16 |
| 164.089 | AMD Ryzen Threadripper 1950X 16-Core Processor | 33.664 | Linux | 4.19.9-041909-generic | 16 |
| 181.100 | AMD Ryzen Threadripper 1950X 16-Core Processor | 33.664 | Linux | 4.19.9-041909-generic | 16 |
| 205.805 | AMD Ryzen Threadripper 1950X 16-Core Processor | 33.664 | Linux | 4.19.9-041909-generic | 16 |
On a single core, base-R BLAS is somewhat faster by 0.49 seconds, a factor of 1.37.
BLAS Single 1.34 seconds.
BLIS Single 1.83 seconds.
But on 8 cores, BLIS becomes much faster by 40.41 seconds, a factor of 7.83.
BLAS 8-core 46.32 seconds.
BLIS 8-core 5.91 seconds.
On 16 cores, BLIS is faster still, by 95.03 seconds, a factor of 9.91.
BLAS 16-core 105.69 seconds.
BLIS 16-core 10.66 seconds.
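BLIS reads its thread count from the environment, so the core counts above can be controlled without rebuilding; a sketch (set these before starting R):

```shell
# BLIS-specific thread count takes precedence
export BLIS_NUM_THREADS=16
# OMP_NUM_THREADS is honored as a fallback when BLIS is built with OpenMP
export OMP_NUM_THREADS=16
R --vanilla -e 'sessionInfo()$BLAS'
```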
Dirk Eddelbuettel: simple scripts to switch BLAS/LAPACK implementations http://dirk.eddelbuettel.com/blog/2018/04/15/#018_mkl_for_debian_ubuntu
Debian/Ubuntu Wiki: alternative implementations of BLAS and LAPACK https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
U Texas Science of High-Performance Computing Group http://shpc.ices.utexas.edu/software.html
BLIS Github: https://github.com/flame/blis
libFLAME Github: https://github.com/flame/libflame/